LP Residual Features for Robust, Privacy-Sensitive Speaker Diarization
Authors
Abstract
We present a comprehensive study of the linear prediction (LP) residual for speaker diarization under single and multiple distant microphone conditions in privacy-sensitive settings, a requirement for analyzing a wide range of spontaneous conversations. Two representations of the residual are compared, namely the real cepstrum and MFCC, with the latter performing better. Experiments on RT06eval show that the residual, combined with subband information from 2.5 kHz to 3.5 kHz and spectral slope, yields performance close to that of traditional MFCC features. To evaluate privacy objectively in terms of linguistic information, we perform phoneme recognition: residual features yield low phoneme accuracies compared to traditional MFCC features.
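As a rough illustration of the quantities named in the abstract, the sketch below computes an LP residual for one frame by inverse filtering and then takes its real cepstrum. It is not the authors' implementation: the autocorrelation method, the LP order of 12, the FFT size of 512, and the function names `lpc_coefficients`, `lp_residual`, and `real_cepstrum` are all assumptions made for this example, since the abstract does not give those details.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC (assumed variant); returns a[0..order], a[0] = 1."""
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # small ridge for numerical stability
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def lp_residual(frame, order=12):
    """Inverse-filter the frame with A(z): e[n] = x[n] + sum_k a[k] x[n-k]."""
    a = lpc_coefficients(frame, order)
    return np.convolve(frame, a)[: len(frame)]

def real_cepstrum(x, n_fft=512):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    magnitude = np.abs(np.fft.rfft(x, n_fft))
    return np.fft.irfft(np.log(magnitude + 1e-12), n_fft)
```

Because inverse filtering removes most of the spectral envelope (the part that carries formant, and hence phonetic, information), the residual is a plausible candidate for privacy-sensitive features, which is consistent with the low phoneme accuracies reported above. An MFCC representation of the residual would replace `real_cepstrum` with a mel filterbank followed by a DCT.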
Similar resources
Multimodal speaker diarization using oriented optical flow histograms
Speaker diarization is the task of partitioning an input stream into speaker-homogeneous regions, or in other words, determining "who spoke when." While approaches to this problem have traditionally relied entirely on the audio stream, the availability of accompanying video streams in recent diarization corpora has prompted the study of methods based on multimodal audio-visual features. In thi...
DiarTk: An Open Source Toolkit for Research in Multistream Speaker Diarization and its Application to Meetings Recordings
The speaker diarization task consists of inferring "who spoke when" in an audio stream without any prior knowledge and has been the object of several NIST international evaluation campaigns in recent years. A common trend for improving performance has been the use of several different feature streams as diverse as speaker location features, visual features or noise-robust acoustic features. This pap...
Using Weighted Oriented Optical Flow Histograms for Multimodal Speaker Diarization
Speaker diarization currently focuses on using audio features to partition an audio stream into speaker-homogeneous speech regions, in other words to determine "who spoke when". Recent speaker diarization corpora contain video recordings in addition to the commonly used audio. Thus, we investigated the benefits of incorporating video features, namely histograms of weighted oriented optical flo...
Integration of TDOA features in information bottleneck framework for fast speaker diarization
In this paper we address the combination of multiple feature streams in a fast speaker diarization system for meeting recordings. Whenever Multiple Distant Microphones (MDM) are used, it is possible to estimate the Time Delay of Arrival (TDOA) for different channels. In [9], it is shown that TDOA can be used as additional features together with conventional spectral features for improving speak...
Robust speaker diarization for meetings: ICSI RT06s evaluation system
In this paper we present the ICSI speaker diarization system submitted for the NIST Rich Transcription evaluation (RT06s) [1], conducted in the meetings environment. This is a set of yearly evaluations which in the last two years have included speaker diarization of two distinct kinds of meetings: conference room and lecture room. The system presented focuses on being robust to changes in the me...
Publication date: 2011